Add Gemini provider and Windows port (M1 foundation) #71
Open
PsychoSatsujin wants to merge 11 commits into farzaa:main from
Two steps toward a multi-provider, cross-platform Clicky.

Gemini alongside Claude
- Worker gains POST /chat-gemini. Model ID travels in the request body; the Worker plugs it into the upstream Gemini URL path.
- GeminiAPI.swift mirrors ClaudeAPI's streaming signature so the call sites don't care which provider is active. CompanionManager gets a runStreamingVisionRequest dispatcher and an isGeminiModelID helper; setSelectedModel updates whichever provider owns the new ID.
- Panel picker gains a Gemini row (Flash default, Pro option). Flash is the default because the motivation here is reducing credit spend.

Windows port (Milestone 1 of 6)
- New windows/ folder with a C# + WPF solution on .NET 8. Single-instance guarded, tray icon via H.NotifyIcon.Wpf, borderless non-activating popover panel matching the macOS design system 1:1 (colors and radii ported verbatim from DesignSystem.swift), global Ctrl+Alt push-to-talk via low-level keyboard hook, settings persisted to %APPDATA%\Clicky.
- No voice pipeline yet: pressing Ctrl+Alt flips AppState so the hook is verifiable. M2 (mic + AssemblyAI + AI + TTS), M3 (screen capture), M4 (cursor overlay), M5 (element pointing), and M6 (onboarding) land in follow-up PRs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
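The provider dispatch and the "model ID goes into the upstream URL path" step described above can be sketched roughly as follows. This is a language-neutral illustration in Python, not the Swift or Worker code from the PR. The prefix test inside `is_gemini_model_id`, the example model IDs, and the use of the public `streamGenerateContent?alt=sse` endpoint are all assumptions, since the PR text does not show them.

```python
from urllib.parse import quote

# Assumed public Gemini REST base; the actual Worker code is not shown in the PR.
GEMINI_BASE = "https://generativelanguage.googleapis.com/v1beta/models"


def is_gemini_model_id(model_id: str) -> bool:
    """Hypothetical stand-in for isGeminiModelID: a plain prefix test."""
    return model_id.startswith("gemini-")


def worker_route_for(model_id: str) -> str:
    """Pick the Worker route that owns this model ID."""
    return "/chat-gemini" if is_gemini_model_id(model_id) else "/chat"


def upstream_gemini_url(model_id: str) -> str:
    """Plug the model ID from the request body into the upstream URL path.

    The ID is percent-encoded so that a malformed value cannot add extra
    path segments or a query string to the upstream request.
    """
    return f"{GEMINI_BASE}/{quote(model_id, safe='')}:streamGenerateContent?alt=sse"
```

Because the model ID comes from the client, encoding it (or checking it against an allow-list) before building the upstream URL is a cheap safeguard worth considering.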
End-to-end push-to-talk flow on Windows: mic capture, AssemblyAI v3 streaming transcription, Claude/Gemini SSE chat, ElevenLabs TTS playback. Text-only (no screenshots yet; that's M3).

Services (windows/Clicky/Services/):
- WorkerConfig.cs: Cloudflare Worker base URL + route constants
- IChatClient.cs: provider-agnostic streaming chat interface
- ClaudeClient.cs: /chat SSE (port of ClaudeAPI.swift)
- GeminiClient.cs: /chat-gemini SSE (port of GeminiAPI.swift)
- AssemblyAIStreamingClient: v3 realtime WebSocket transcription
- MicrophoneCaptureService: NAudio WaveInEvent @ 16 kHz PCM16 mono
- ElevenLabsTtsClient: /tts MP3 fetch + NAudio Mp3FileReader playback
- DictationSession: mic ↔ AssemblyAI bridge, 2.8s finalize fallback
- VoicePipelineOrchestrator: end-to-end press→listen→process→respond→idle

Wire-up:
- AppState gains LiveTranscript, StreamedResponseText, LastStatusMessage
- TrayPanelViewModel exposes AppState so the panel can bind directly
- TrayPanelWindow.xaml renders live transcript + streaming response rows, collapsing via a new StringToVisibilityConverter
- App.xaml.cs hands Ctrl+Alt press/release to the orchestrator and tears it down cleanly on exit
- Clicky.csproj adds NAudio 2.2.1

Housekeeping:
- .gitignore excludes windows/**/bin|obj/ and *.user
- windows/README.md marks M2 complete and documents WorkerConfig edit step

Build verified: dotnet build → 0 errors (6 pre-existing M1 warnings).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
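The press→listen→process→respond→idle cycle that VoicePipelineOrchestrator drives can be pictured as a small state machine. The sketch below is a Python illustration only. The state names follow the commit messages, but the event names (`press`, `release`, `first_token`, `playback_done`) and the rule that unknown events are ignored are assumptions, not taken from the C# code.

```python
from enum import Enum


class VoiceState(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    PROCESSING = "processing"
    RESPONDING = "responding"


# Hypothetical event names; the real orchestrator's triggers are not shown in the PR.
TRANSITIONS = {
    (VoiceState.IDLE, "press"): VoiceState.LISTENING,
    (VoiceState.LISTENING, "release"): VoiceState.PROCESSING,
    (VoiceState.PROCESSING, "first_token"): VoiceState.RESPONDING,
    (VoiceState.RESPONDING, "playback_done"): VoiceState.IDLE,
}


def step(state: VoiceState, event: str) -> VoiceState:
    """Return the next state; events that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events):
    """Fold a sequence of events over the machine, starting from IDLE."""
    state = VoiceState.IDLE
    for event in events:
        state = step(state, event)
    return state
```

Keeping the transitions in one table makes it easy to see that a stray key release while idle, for example, cannot push the pipeline into processing.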
Adds GDI BitBlt-based ScreenCaptureService that enumerates every attached monitor, captures PerMonitorV2-aware JPEGs (quality 80, downscaled to 1280px longest side), and orders them cursor-first to match the macOS CompanionScreenCaptureUtility contract.

InlineImage gains an optional label so ClaudeClient and GeminiClient can emit a text part before each image, giving the model the "screen N of M — cursor is on this screen (primary focus) (image dimensions: WxH pixels)" context macOS already uses.

VoicePipelineOrchestrator captures the screens on every push-to-talk release, feeds the labeled JPEGs to the selected provider, and strips the trailing [POINT:…] tag before TTS speaks the reply (M4/M5 will start consuming it). System prompt is now the verbatim macOS companionVoiceResponseSystemPrompt so prompt-engineering tweaks there translate directly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
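Three of the small computations mentioned above (the 1280px downscale, cursor-first ordering, and stripping the trailing tag before TTS) are sketched below in Python for illustration. The function names, the `has_cursor` field, and the choice to leave already-small screens at their native size are assumptions. The C# ScreenCaptureService may behave differently.

```python
import re


def downscaled_size(width: int, height: int, longest_side: int = 1280):
    """Scale so the longest side is at most `longest_side`, keeping aspect ratio.

    Assumption: images that are already small enough are not upscaled.
    """
    longest = max(width, height)
    if longest <= longest_side:
        return width, height
    return (max(1, round(width * longest_side / longest)),
            max(1, round(height * longest_side / longest)))


def order_cursor_first(screens):
    """Stable sort: the screen holding the cursor comes first, others keep their order."""
    return sorted(screens, key=lambda s: not s["has_cursor"])


# Matches one [POINT:...] tag at the very end of the reply, plus surrounding whitespace.
TRAILING_POINT_TAG = re.compile(r"\s*\[POINT:[^\]]*\]\s*$")


def strip_point_tag(reply: str) -> str:
    """Remove a trailing [POINT:...] tag so TTS does not read it aloud."""
    return TRAILING_POINT_TAG.sub("", reply)
```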
Adds one click-through topmost OverlayWindow per connected display (WS_EX_TRANSPARENT | WS_EX_LAYERED | WS_EX_NOACTIVATE | WS_EX_TOOLWINDOW), with a 16-DIP equilateral blue triangle (#3380FF, rotated -35°, soft blue glow) that follows the system cursor at 60 fps. Only the overlay on the cursor's monitor renders the triangle; the rest stay hidden so nothing flickers between displays. OverlayWindowManager owns the per-monitor lifecycle and the DispatcherTimer that polls GetCursorPos every 16 ms. Triangle visibility follows AppState.CurrentVoiceState so the triangle only shows during Idle / Responding — Listening and Processing hide it, leaving room for the waveform and spinner that will land in M5/M6. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
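Deciding which overlay should render the triangle comes down to a point-in-rectangle test against each monitor's bounds on every timer tick. The Python sketch below illustrates that test. The `Monitor` type and the half-open bounds convention are assumptions, and the real OverlayWindowManager works from GetCursorPos and Win32 monitor rectangles rather than these hypothetical values.

```python
from typing import NamedTuple, Optional, Sequence


class Monitor(NamedTuple):
    # Bounds in virtual-desktop device pixels; left/top may be negative
    # for monitors placed left of or above the primary display.
    left: int
    top: int
    right: int
    bottom: int


def monitor_index_at(monitors: Sequence[Monitor], x: int, y: int) -> Optional[int]:
    """Return the index of the monitor containing (x, y), or None if none does.

    Bounds are treated as half-open (right and bottom are exclusive), so a
    point on a shared edge belongs to exactly one monitor.
    """
    for index, m in enumerate(monitors):
        if m.left <= x < m.right and m.top <= y < m.bottom:
            return index
    return None
```

A 16 ms timer interval corresponds to 1000 / 16 = 62.5 ticks per second, which is consistent with the roughly 60 fps figure quoted above.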
Parses [POINT:x,y:label:screenN] tags out of each reply, rescales the screenshot coords to the monitor's native device pixels, and flies the overlay triangle along a quadratic bezier arc (smoothstep easing, tangent-based rotation, scale pulse) to the target. On arrival a blue speech bubble spring-bounces in with a streamed random phrase, holds 3 s, fades, then the triangle flies back to the cursor. Port of the macOS OverlayWindow.animateBezierFlightArc + streamBubbleText flow. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
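The three steps described above (parse the tag, rescale to device pixels, evaluate the flight arc) can be sketched as below. This is a Python illustration under stated assumptions: the regular expression assumes integer coordinates and a label without colons, and the control point passed to the bezier functions is the caller's choice. None of this is the C# or Swift implementation.

```python
import math
import re
from typing import Optional, Tuple

Point = Tuple[float, float]

# Assumed tag grammar: [POINT:x,y:label:screenN] with integer x, y and N.
POINT_TAG = re.compile(r"\[POINT:(\d+),(\d+):([^:\]]+):screen(\d+)\]")


def parse_point_tag(reply: str) -> Optional[Tuple[int, int, str, int]]:
    """Return (x, y, label, screen_number) from the first tag, or None."""
    m = POINT_TAG.search(reply)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), m.group(3), int(m.group(4))


def to_device_pixels(x: float, y: float, image_size, monitor_rect) -> Point:
    """Map screenshot coordinates onto the monitor's native pixel rectangle."""
    img_w, img_h = image_size
    left, top, right, bottom = monitor_rect
    return (left + x * (right - left) / img_w,
            top + y * (bottom - top) / img_h)


def smoothstep(t: float) -> float:
    """Ease-in/ease-out curve 3t^2 - 2t^3 on the clamped range [0, 1]."""
    t = min(1.0, max(0.0, t))
    return t * t * (3.0 - 2.0 * t)


def bezier_point(p0: Point, p1: Point, p2: Point, t: float) -> Point:
    """Quadratic bezier: (1-t)^2 p0 + 2(1-t)t p1 + t^2 p2."""
    u = 1.0 - t
    return (u * u * p0[0] + 2 * u * t * p1[0] + t * t * p2[0],
            u * u * p0[1] + 2 * u * t * p1[1] + t * t * p2[1])


def bezier_heading_degrees(p0: Point, p1: Point, p2: Point, t: float) -> float:
    """Direction of the tangent 2(1-t)(p1-p0) + 2t(p2-p1), in degrees."""
    u = 1.0 - t
    dx = 2 * u * (p1[0] - p0[0]) + 2 * t * (p2[0] - p1[0])
    dy = 2 * u * (p1[1] - p0[1]) + 2 * t * (p2[1] - p1[1])
    return math.degrees(math.atan2(dy, dx))
```

In an animation loop, the eased parameter `smoothstep(elapsed / duration)` would be fed to `bezier_point` for position and to `bezier_heading_degrees` for the tangent-based rotation.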
Adds the three remaining parity surfaces:
- MicrophonePermissionHelper probes for an active capture endpoint at startup and offers a one-click shortcut to ms-settings:privacy-microphone when Windows privacy settings are blocking the mic. The tray panel shows an inline callout whenever AppState.IsMicrophonePermissionIssue is set.
- First-run onboarding: if HasCompletedOnboarding is false the panel auto-opens centered on the primary monitor with a welcome block and a "Get started" button that flips the flag. A "Watch welcome again" footer link replays it.
- ClickyAnalytics POSTs directly to PostHog /capture/ with the same event surface as the macOS ClickyAnalytics.swift (app_opened, onboarding_*, permission_*, push_to_talk_*, user_message_sent, ai_response_received, element_pointed, response_error, tts_error), keyed by a stable anonymous distinct_id persisted in settings.json. Write key placeholder in WorkerConfig.cs; every event is silently dropped until it's swapped for a real key.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
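The analytics bullet above implies two small pieces of logic: a distinct_id that is created once and then reused, and a request body shaped for PostHog's capture endpoint (`api_key`, `event`, `distinct_id`, `properties`). The Python sketch below illustrates both. The placeholder string, the function names, and the decision to drop events on the client side while the placeholder is in place are assumptions. The commit message does not say where the dropping happens.

```python
import json
import uuid
from typing import Optional

# Hypothetical placeholder value; the real one lives in WorkerConfig.cs.
PLACEHOLDER_KEY = "phc_REPLACE_ME"


def get_or_create_distinct_id(settings: dict) -> str:
    """Reuse the persisted anonymous ID, creating it on first use."""
    if not settings.get("distinct_id"):
        settings["distinct_id"] = str(uuid.uuid4())
    return settings["distinct_id"]


def build_capture_body(api_key: str, settings: dict, event: str,
                       properties: Optional[dict] = None) -> Optional[str]:
    """Return the JSON body for a POST to /capture/, or None to drop the event."""
    if api_key == PLACEHOLDER_KEY:
        return None  # assumed client-side drop while no real key is configured
    return json.dumps({
        "api_key": api_key,
        "event": event,
        "distinct_id": get_or_create_distinct_id(settings),
        "properties": properties or {},
    })
```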
Install-Clicky.bat double-clicks into Install-Clicky.ps1 (runs per-user with ExecutionPolicy Bypass, no admin needed). The PowerShell installer publishes a self-contained single-file Clicky.exe to %LOCALAPPDATA%\Programs\Clicky, creates Start Menu + Desktop shortcuts (minimised since Clicky is a tray app), registers Clicky in Apps & Features via the HKCU Uninstall key, optionally adds the HKCU Run entry so it launches on login, and drops a self-contained Uninstall-Clicky.ps1 next to the exe. -FrameworkDependent, -NoAutoStart, and -NoLaunch switches let power users opt out of the defaults. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace em-dashes and Unicode arrows in Install-Clicky.ps1 with ASCII
equivalents. PS 5.1 reads .ps1 files as ANSI by default, so the UTF-8
multi-byte sequences were mis-decoded and broke string parsing ("Missing
closing '}'", "Unexpected token ')'", "The string is missing the
terminator"). Verified with [Parser]::ParseFile -- no parse errors.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
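The parse failures described in the commit above can be reproduced outside PowerShell by decoding UTF-8 bytes with a single-byte code page. The Python sketch below assumes the ANSI code page is Windows-1252, which varies by system locale. As far as I understand, PowerShell's tokenizer also accepts typographic quotes as string delimiters, which would explain why the mis-decoded characters broke string parsing. The `to_ascii` helper is a hypothetical illustration of the fix, not the actual edit.

```python
def misread_as_ansi(text: str, codepage: str = "cp1252") -> str:
    """Encode as UTF-8, then decode the bytes the way an ANSI reader would."""
    return text.encode("utf-8").decode(codepage)


# Em-dash U+2014 is the bytes E2 80 94 in UTF-8. Read as Windows-1252 the
# last byte becomes U+201D, a right double quotation mark.
em_dash_misread = misread_as_ansi("\u2014")

# Rightwards arrow U+2192 is E2 86 92. The last byte becomes U+2019,
# a right single quotation mark.
arrow_misread = misread_as_ansi("\u2192")


def to_ascii(source: str) -> str:
    """Replace the two offending characters with ASCII equivalents."""
    return source.replace("\u2014", "--").replace("\u2192", "->")
```

Each mis-decoded sequence ends in a quotation-mark character, so a single em-dash inside a string literal can appear to close the string early, and the parser then reports errors at unrelated braces and parentheses.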
Configure-Worker.bat double-clicks into Configure-Worker.ps1, which reads
API keys from worker/.secrets.local (KEY=VALUE per line, gitignored), runs
`wrangler login` if needed, pipes each populated value to `wrangler secret
put` so secrets never appear on a command line, and finishes with
`wrangler deploy`. .secrets.local.example is the template the script
auto-copies on first run; the real .secrets.local is gitignored so keys
cannot be committed accidentally. Script also drops a Desktop shortcut
("Configure Clicky Worker") on first run for one-click reconfiguration
after editing keys. -SkipDeploy and -NoShortcut switches let advanced
users opt out of the defaults.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
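The core of that script is "parse KEY=VALUE lines, skip empty values, and pass each value on stdin rather than as an argument". The sketch below illustrates it in Python rather than the PowerShell used in the PR. Treating `#` lines as comments and the helper names are assumptions. `wrangler secret put <NAME>` reading the value from standard input is the documented wrangler behaviour that the commit relies on.

```python
import shutil
import subprocess
from typing import Dict


def parse_secrets(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, keeping only entries with a non-empty value."""
    secrets: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and '#' comments are ignored.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # values may themselves contain '='
        key, value = key.strip(), value.strip()
        if key and value:
            secrets[key] = value
    return secrets


def put_secret(name: str, value: str) -> None:
    """Send the value on stdin so it never appears on a command line."""
    wrangler = shutil.which("wrangler") or "wrangler"
    subprocess.run([wrangler, "secret", "put", name],
                   input=value, text=True, check=True)
```

Passing the value via `input=` keeps it out of the process argument list, where it could otherwise be visible to other processes or end up in shell history.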
1. TrayPanelWindow.xaml: move the Grid.Resources block (which declares
StringToVisibilityConverter, BooleanToVisibilityConverter, and the
button styles) from the bottom of the root Grid to the top. WPF
resolves {StaticResource ...} markup extensions at parse time and does
not support forward references within the same XAML scope, so when the
StackPanels at the top of the Grid referenced the converters declared
below them, parsing crashed with "Cannot find resource named
'BooleanToVisibilityConverter'". Hitting it required actually opening
the panel, which is why this slipped past `dotnet build`. Resources
now precede their use sites.
2. App.xaml.cs: tray-icon fallback was returning a RenderTargetBitmap to
H.NotifyIcon.TaskbarIcon.IconSource. The library's ToStreamAsync helper
only handles a narrow set of ImageSource subtypes (BitmapImage with
UriSource, BitmapFrame from a Uri, etc.) and threw NotImplementedException
on RenderTargetBitmap; an interim BitmapImage-from-MemoryStream attempt
then tripped a NullReferenceException because the library tried to read
the absent UriSource. Bypass that whole conversion path by setting
TaskbarIcon.Icon (System.Drawing.Icon) directly: real .ico resources
load via `new Icon(stream)`, and the fallback "blue dot" placeholder is
built with a 32x32 GDI Bitmap + GetHicon. Verified the app now stays
running stably with the tray icon visible.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`npm audit fix --force` upgraded wrangler from ^3.0.0 to ^4.85.0, which transitively replaces the vulnerable esbuild, undici, and miniflare versions flagged by:
- GHSA-67mh-4wv8-2f99 (esbuild dev-server SSRF)
- GHSA-g9mf-h72j-4rw9 (undici unbounded decompression)
- GHSA-2mjp-6q6p-2qxm (HTTP smuggling)
- GHSA-vrm6-8vpv-qv8q (WS permessage-deflate memory)
- GHSA-v9p9-hfj2-hcw8 (invalid server_max_window_bits handling)
- GHSA-4992-7rv2-5pvq (undici upgrade-option CRLF injection)

`npm audit` now reports 0 vulnerabilities. Verified wrangler.toml still parses under v4 via `wrangler deploy --dry-run` (Worker uploads as 5.21 KiB, ELEVENLABS_VOICE_ID env binding intact).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
- /chat-gemini route on the Cloudflare Worker, GeminiAPI.swift mirroring ClaudeAPI's streaming interface, and provider dispatch in CompanionManager. Model picker now shows Claude (Sonnet/Opus) and Gemini (Flash/Pro).
- Windows port (windows/). Tray icon, borderless non-activating popover, global Ctrl+Alt push-to-talk via low-level keyboard hook, settings persistence, design-system parity (DesignSystem.xaml). Zero embedded secrets; shares the same Worker proxy.
- AGENTS.md updated to cross-platform framing; windows/README.md covers build steps and milestone roadmap (M2 voice → M3 capture → M4 overlay → M5 pointing → M6 polish).

Test plan
- dotnet run --project windows/Clicky, confirm tray icon, click opens popover near tray, Ctrl+Alt fires press/release events, settings persist to %APPDATA%\Clicky\settings.json.

🤖 Generated with Claude Code